Abstract

In this paper we introduce a novel gossiping primitive to support
privacy preserving data analytics (PPDA). In contrast to existing
computational PPDA primitives such as secure multiparty computation
and data randomization based approaches, the proposed primitive,
"anonymous gossiping", is a communication primitive for privacy
preserving personalized information aggregation that complements such
traditional computational analytics. We realize this novel primitive
by composing existing gossiping mechanisms for peer sampling and
information aggregation with the onion routing technique for
establishing anonymous communication. This is more of an 'ideas'
paper than one providing concrete and quantified results.

Keywords: privacy, anonymity, aggregation, gossip algorithms

"It is perfectly monstrous the way people go about nowadays saying 
things against one, behind one's back, that are absolutely and entirely 
true." — Oscar Wilde 

1 Introduction 

Information aggregation and mining are often used to obtain
collective intelligence, to generate a panoramic (macroscopic) view
of a system, or to devise useful recommendation mechanisms. An
interesting niche which has been studied for the last decade is that
of privacy preserving data mining (PPDM). The essential idea, to
quote the seminal paper on PPDM [1], is "Since the primary task in
data mining is the development of models about aggregated data, can
we develop accurate models without access to precise information in
individual data records?". The early works on PPDM were based on
random perturbation of information. Since then, a new class of PPDM
based on secure multiparty computation [14] has also evolved. The
trade-offs between the two approaches mainly concern accuracy,
computational complexity and scalability. Research on both these
individual families of privacy preserving data analytics (PPDA), as
well as on hybridized solutions, continues at full steam. Privacy
preserving data mining in P2P environments [4] has also gained
considerable attention in recent years.

In this paper we address an orthogonal question: can we facilitate
collaborative data analytics among users without disclosing the
identities of those who participate and contribute the data?
Computational PPDAs do not provide anonymity. Such privacy is
important, say, when the analytics is carried out for a specific
subset of users with some shared characteristics, such that besides
the privacy of the individual records, the users may also want to
keep private the very fact that they have those characteristics.

We propose a communication primitive (anonymous gossiping) which
facilitates such privacy. Specifically, we adapt a well studied
point-to-point anonymized communication technique, onion routing
[10, 8], to achieve it. Note that the actual data analytics itself
may or may not additionally preserve the privacy of the individual
records. For example, for an anonymized paper submission, it is
acceptable that the reviewers can read the content of the paper, as
long as they do not know who wrote it. It is in this sense that our
mechanism complements the existing computational primitives.
Anonymous gossiping finds ready usage in emerging P2P applications
such as user affinity based personalized decentralized search [13, 3]
and personalized recommendation in decentralized online social
networks [?].

While anonymous communication in P2P systems is an old and well
studied problem, e.g., Freenet [?], the novelties of this paper are
(i) defining a novel communication primitive for PPDA, and (ii)
proposing a concrete way to realize it by composing well studied
existing building blocks.

In this short paper, we limit ourselves to defining this new problem
and sketching a first solution to it. A more rigorous analysis and
evaluation of the proposed mechanism's security, performance and
overheads, and any subsequent optimization or exploration of
alternatives, are left for future study.

In Section 2 we provide a more precise description of the problem
along with a sketch of the solution. We elaborate on our assumptions
and notation in Section 3 before presenting the anonymous gossiping
protocol in Section 4. We wrap up in Section 5 with concluding
remarks highlighting several interesting extensions and research
problems that arise from the current work.

1 Epidemic information dissemination leveraging anonymous
interactions in mobile ad hoc networks has been studied in the past
under the same name, "anonymous gossiping" [6], but that work is
unrelated to ours.

2 Problem statement & solution sketch 

We want to facilitate the following: 

1. Allow user-specific information (let us call it the user profile)
to be used to carry out any kind of personalized aggregation,
clustering, etc. Such mechanisms can then be used for various
personalized services such as recommendation or query expansion
[3, 13].

2. Ensure that a user cannot be associated with her profile by
others, even while individual users benefit from personalization
facilitated by analysis of information aggregated from other similar
users.

The basic outline of a potential solution to achieve the above
mentioned objectives comprises the following steps and building
blocks:

1. Aggregation task delegation: Each user delegates the task of
personalized aggregation to a proxy peer.

2. Proxy peers interact with each other to carry out the aggregation
task on behalf of the users.

3. Ensure that the proxy peer is oblivious of the identity of the
user(s) on whose behalf it carries out the aggregation task. This in
turn ensures the users' privacy. It necessitates a mechanism for
users to assign the task to a proxy without being identified, and
also a mechanism for the proxy to deliver the aggregated information
back to the original user without knowing who she is.

Here we describe a mechanism (anonymous gossiping) to achieve the
last point. How the aggregation task itself is carried out among the
proxies is an orthogonal issue. This includes both how proxies
interact with each other and how they carry out the data analytics.
Anonymous gossiping is generic in that it can be applied to provide
user privacy while using arbitrary gossiping algorithms for
information aggregation.

In the current work we do not consider whether any peer is adequate
to act as a proxy, or whether other considerations such as
trustworthiness or betweenness in the social graph (facilitating
quicker aggregation) need to be taken into account while delegating
the task.

2 For simplicity, we choose to use the feminine form to address the 
users, instead of using his/her/its on every occasion. 



3 Assumptions and notations 

Our solution relies on the following assumptions and existing
primitives.

1. Users form and participate in an overlay. This overlay may be a
classical unstructured network or a semantic or social overlay.

2. Users use a public key as their logical identifier in the system.
However, there is no need for a public key infrastructure (PKI) since
we are not trying to establish whether a specific public key belongs
to a specific user. The public key is used so that anything encrypted
with it can be decrypted only with the corresponding private key.

3. A random set of peers (public keys and corresponding contact
information such as IP address/port number) can be obtained without
an adversary knowing who obtained which piece of information. This
assumption is important: if one can determine everyone who obtained a
specific peer ID from the sampling service, then the degree of
anonymity is reduced.

A random set of peers can readily be obtained using gossip based peer
sampling [12]. We argue that peers who participate in the process of
peer sampling for a relatively long time would encounter a
sufficiently large number of other peers to mitigate any set
intersection analysis by an adversary.

4. Proxies delegated by users of similar profiles can discover each
other and carry out the aggregation task. Note that this last
assumption is needed for the aggregation task, and is orthogonal to
the anonymity issues. Gossip based mechanisms like T-man [11] or
variants [16] can be applied for this. Note also that the gossiping
overheads for the various tasks like peer sampling and information
aggregation can be amortized.
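
To illustrate the peer sampling assumption above, the following is a
minimal sketch of a push-pull view exchange in the spirit of gossip
based peer sampling. It is a simplified stand-in and not the protocol
of the cited work: the names Node, gossip_round, build_ring and the
parameter VIEW_SIZE are hypothetical, peers are plain integers rather
than public keys with contact information, and view aging and churn
are ignored.

```python
import random

VIEW_SIZE = 8  # hypothetical bound on the partial view each peer keeps


class Node:
    """A peer holding a bounded partial view of other peer IDs."""

    def __init__(self, node_id, initial_view):
        self.node_id = node_id
        self.view = set(initial_view) - {node_id}
        self.seen = set(self.view)  # every peer ID encountered so far

    def merge(self, incoming, rng):
        """Merge received IDs into the view, then truncate at random."""
        candidates = (self.view | set(incoming)) - {self.node_id}
        self.seen |= candidates
        keep = min(VIEW_SIZE, len(candidates))
        self.view = set(rng.sample(sorted(candidates), keep))


def gossip_round(nodes, rng):
    """Each node swaps views with one random peer from its view."""
    for node in nodes.values():
        if not node.view:
            continue
        partner = nodes[rng.choice(sorted(node.view))]
        sent = node.view | {node.node_id}
        received = partner.view | {partner.node_id}
        partner.merge(sent, rng)
        node.merge(received, rng)


def build_ring(n):
    """Bootstrap overlay: each peer initially knows two ring neighbours."""
    return {i: Node(i, [(i - 1) % n, (i + 1) % n]) for i in range(n)}


if __name__ == "__main__":
    rng = random.Random(7)
    nodes = build_ring(50)
    for _ in range(30):
        gossip_round(nodes, rng)
    # Smallest number of distinct peers any node has encountered so far.
    print(min(len(node.seen) for node in nodes.values()))
```

The set "seen" only grows with each exchange, which is the property
the argument about long-running participants relies on: the more
rounds a peer takes part in, the larger the set of peers it could
have drawn a delegate or relay from.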

We use the following notation while detailing the anonymous gossiping
mechanism.

$a_i$  Public key and ID of peer $i$.

$E_i(\cdot)$  Encryption of a message with the public key of peer
$a_i$, so that only she can decrypt it.

$\Phi_i$  Random set of peers that peer $a_i$ has obtained somehow.

$\kappa_i$  A symmetric encryption/decryption key created by peer
$a_i$.

$\kappa_i(\cdot)$  Message encrypted with the key $\kappa_i$.

$\pi_i$  Profile of peer $a_i$. The records of the profile may or may
not themselves need to be perturbed or obfuscated for privacy
preserving data analytics [1]. That is however an orthogonal issue.

$\pi'_i$  Aggregated/personalized clustered information corresponding
to profile $\pi_i$.



4 Anonymous gossiping 

There are three logical phases in anonymous gossiping: (phase I)
delegation of the aggregation task to a proxy in an anonymous manner,
(phase II) the proxies carrying out the delegated aggregation tasks,
and finally (phase III) obtaining the results back from the delegate
in an anonymous manner.

The aggregation task (phase II) is an interesting problem in its own
right. Either existing solutions [9, 2, 13] or new ones may be
applied for it. We consider it as a black box and focus on the other
two phases as described next.

In the description below we consider a scenario where Alice uses
anonymous gossiping to carry out privacy preserving personalized
aggregation.

4.1 Delegation of aggregation task 

The aggregation task is delegated to a proxy as follows:

• Obtain $\Phi_{Alice}$, a moderately large and random subset of
peers in the system, e.g., using peer sampling [12].

• Determine the candidate $a_p$ for task delegation, where
$a_p \in \Phi_{Alice}$. Alice needs to send the message
$msg = (\pi_{Alice}, \kappa_{Alice})$, containing her profile and a
symmetric key, to this delegate anonymously.

Note that the message contains the profile, but not Alice's identity.
However, if the profile itself contains identity revealing details
(such as search terms from 'ego search') then our approach cannot
provide anonymity. Generally speaking, possible obfuscation of the
records using traditional PPDM techniques [1] may additionally be
necessary.

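• Choose $k$ other peers $a_{p_1}, \ldots, a_{p_k} \in \Phi_{Alice}$,
where $k$ is a random integer chosen uniformly from some predefined
range, say [5, ..., 20]. These peers will be used to form an onion
route [8, 10] between Alice and the delegate $a_p$.

• Send to peer $a_{p_1}$ an onion encoded message

$E_{p_1}(E_{p_2}(\ldots(E_{p_k}(E_p(msg), a_p), a_{p_k}), \ldots), a_{p_2})$

in which each layer pairs the inner onion with the address of the
next hop, and the innermost layer names the delegate $a_p$.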

When a peer $a_{p_j}$ receives an onion encoded message from
$a_{p_{j-1}}$, she can only decrypt the outermost layer with her own
private key. Upon decryption, she obtains another encrypted message
along with the identity/address of the next node to which she should
pass it. Thus, intermediate nodes know neither how many or which
nodes have already routed the message, nor which nodes will
subsequently do so. Each node only knows its immediate upstream and
downstream peers on an onion route. The fact that the nodes were
chosen at random by the source further reduces the chances of
collusion. The random choice of the length $k$ of the onion route
provides a further level of obfuscation and plausible deniability for
Alice. It also provides robustness against small scale opportunistic
collusion among some of the intermediate nodes to unravel the route
requester's identity. Onion routing is robust against traffic
snooping provided a minimal amount of ambient traffic is present in
the system [?].

Once the designated delegate $a_p$ obtains
$msg = (\pi_{Alice}, \kappa_{Alice})$, it can carry out the Phase-II
task of aggregation and analytics using $\pi_{Alice}$ to compile
$\pi'_{Alice}$.
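
The delegation steps of this section can be sketched as follows. This
is a toy model under stated assumptions, not an implementation of the
scheme: the class Sealed only records which peer is allowed to open a
layer and performs no real encryption, a peer's "key" is simply its
ID string, and the helper names (encrypt, decrypt, build_onion,
relay_step, choose_route) are hypothetical. It is meant only to make
the layering and hop-by-hop peeling concrete.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Sealed:
    """Toy stand-in for public key encryption. NOT real cryptography."""
    recipient: str
    payload: object


def encrypt(recipient_key, payload):
    return Sealed(recipient_key, payload)


def decrypt(own_key, sealed):
    # In this toy model only the addressed peer may open a layer.
    if sealed.recipient != own_key:
        raise PermissionError("layer is not addressed to this peer")
    return sealed.payload


def build_onion(msg, delegate, route):
    """Seal msg for the delegate, then add one layer per relay."""
    onion = encrypt(delegate, msg)       # innermost: readable by delegate
    next_hop = delegate
    for relay in reversed(route):        # last relay first, first relay last
        onion = encrypt(relay, (onion, next_hop))
        next_hop = relay
    return onion                         # to be handed to route[0]


def relay_step(own_key, onion):
    """A relay peels one layer and learns only the next hop."""
    inner, next_hop = decrypt(own_key, onion)
    return inner, next_hop


def choose_route(peers, delegate, rng, k_min=5, k_max=20):
    """Pick k distinct relays, with k uniform in [k_min, k_max].

    Assumes the sampled peer set holds at least k_min non-delegate peers.
    """
    candidates = [p for p in peers if p != delegate]
    k = rng.randint(k_min, min(k_max, len(candidates)))
    return rng.sample(candidates, k)


if __name__ == "__main__":
    rng = random.Random(42)
    sampled_peers = [f"peer{i}" for i in range(40)]
    delegate = rng.choice(sampled_peers)
    route = choose_route(sampled_peers, delegate, rng)
    msg = ("profile-of-alice", "symmetric-key-of-alice")

    onion, hop = build_onion(msg, delegate, route), route[0]
    while hop != delegate:
        onion, hop = relay_step(hop, onion)
    print(decrypt(delegate, onion) == msg)
```

Note that no layer contains Alice's identity: each relay sees only
the peer that handed it the onion and the next hop named inside its
own layer.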

4.2 Collecting aggregated information 

One option to collect the aggregated information is for Alice to
probe the delegate, and pull the response along a (new) onion route.
Note that Alice should not reveal her identity to the delegate, so
the delegate cannot know which node to send the response to. The
response $\kappa_{Alice}(\pi'_{Alice})$, encrypted with the symmetric
key originally sent by Alice and digitally signed by the delegate
$a_p$, may be sent upstream along the onion route (since each node
knows its immediate upstream and downstream neighbors on a route)
without the delegate having to know the destination specifically.
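
One way to picture the upstream delivery is sketched below. The text
only states that each node knows its immediate neighbors on a route;
the per-link labels used here to find the way back are an assumption
of this sketch (loosely similar to circuit identifiers), and the
names Relay, probe and reply are hypothetical. Encryption and
signatures are left out so that only the reverse-path bookkeeping is
shown.

```python
import itertools


class Relay:
    """A relay that remembers, per route, only its upstream neighbour."""

    def __init__(self, name):
        self.name = name
        self._labels = itertools.count(1)
        # label on the downstream link -> (upstream peer, upstream label)
        self.back = {}

    def forward(self, upstream, upstream_label):
        """Record the reverse hop; return a fresh label for the next link."""
        label = next(self._labels)
        self.back[label] = (upstream, upstream_label)
        return label


def probe(network, path):
    """Route a probe along path = [source, relay_1, ..., relay_k, delegate].

    Returns what the delegate observes: the last relay and its link label.
    """
    label = None  # the source has no upstream link
    for upstream, node in zip(path, path[1:-1]):
        label = network[node].forward(upstream, label)
    return path[-2], label


def reply(network, last_relay, label, payload):
    """Send payload back hop by hop; the delegate never names the source."""
    hop, trace = last_relay, []
    while label is not None:
        trace.append(hop)
        hop, label = network[hop].back[label]
    return hop, trace, payload


if __name__ == "__main__":
    relays = ["r1", "r2", "r3"]
    network = {name: Relay(name) for name in relays}
    last, label = probe(network, ["alice"] + relays + ["delegate"])
    print(reply(network, last, label, "sealed-result"))
```

The delegate only ever handles the pair (last relay, label), so it
can answer without learning where the route began.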

There are however some potential drawbacks with such a pull based
approach. Firstly, since Alice does not know if and when the delegate
has completed the computation of $\pi'_{Alice}$, she may have to
initiate the pull on multiple occasions. Secondly, and more
fundamentally, a passive attacker can monitor the network traffic
(sniffing for $\kappa_{Alice}(\pi'_{Alice})$) and detect the terminal
point. This can be alleviated if the message is encrypted by the
intermediate nodes at every hop using the public key of the immediate
upstream node.

An alternative to pull is a blind gossip based push mechanism.
Whenever proxy $a_p$ needs to send the aggregated information
$\pi'_{Alice}$ back to Alice, she can just flood the network with the
corresponding message $(a_p, \kappa_{Alice}(\pi'_{Alice}))$,
digitally signed with her private key. On receiving any such flooded
message originating from $a_p$, Alice can still determine whether the
message is indeed meant for her using $\kappa_{Alice}(\pi'_{Alice})$,
since only her symmetric key decrypts it, even if $a_p$ is also a
proxy for other peers. Alice should continue forwarding the message
in all cases, so that no one monitoring the network can identify her
as the intended destination for the information. A possible
optimization during flooding is that each peer propagates the message
only to peers from which it has received onion routed messages within
a recent time window. That will ensure, in the absence of churn, that
the source (Alice) gets the aggregated information, while avoiding
larger scale flooding.
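
The blind push can be sketched as a flood in which every peer
forwards the message and silently attempts to open it with its own
symmetric key. The sketch below makes several assumptions: the
functions seal and try_open are a toy authenticated cipher built from
SHA-256 and HMAC purely for illustration (no nonce, one key for both
purposes, not suitable for real use), the delegate's digital
signature is omitted, and the names flood, overlay and keys are
hypothetical.

```python
import hashlib
import hmac
from collections import deque


def _keystream(key, n):
    """Derive n pseudo-random bytes from key (toy construction)."""
    out, counter = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]


def seal(key, plaintext):
    """Toy authenticated encryption: 32-byte HMAC tag followed by XOR body."""
    stream = _keystream(key, len(plaintext))
    body = bytes(a ^ b for a, b in zip(plaintext, stream))
    return hmac.new(key, body, hashlib.sha256).digest() + body


def try_open(key, blob):
    """Return the plaintext if blob was sealed under key, else None."""
    tag, body = blob[:32], blob[32:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    stream = _keystream(key, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


def flood(overlay, origin, blob, keys):
    """Breadth-first flood: every peer forwards, key holders also try to read."""
    delivered, seen, queue = {}, {origin}, deque([origin])
    while queue:
        peer = queue.popleft()
        if peer in keys:
            plain = try_open(keys[peer], blob)
            if plain is not None:
                delivered[peer] = plain
        for neighbour in overlay[peer]:   # forward in all cases
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return delivered, seen


if __name__ == "__main__":
    overlay = {
        "proxy": ["b"],
        "b": ["proxy", "alice", "c"],
        "alice": ["b", "c"],
        "c": ["b", "alice"],
    }
    keys = {"alice": b"kappa-alice", "c": b"kappa-c"}
    blob = seal(keys["alice"], b"aggregate for alice")
    delivered, reached = flood(overlay, "proxy", blob, keys)
    print(delivered, sorted(reached))
```

Because Alice forwards the message exactly like every other peer, an
observer of the flood sees no difference between her and peers whose
trial decryption failed.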



5 Concluding remarks 

We have defined a new communication primitive, namely anonymous
gossiping, which can support privacy preserving data aggregation and
analytics, complementing traditional computational primitives for
PPDA based on randomization or secure multi-party computation. We
have provided a rough but concrete sketch of one way to realize
anonymous gossiping by composing existing techniques like peer
sampling and onion routing. Use of such mature techniques is expected
to facilitate a quick implementation of anonymous gossiping.

Additionally, this paper opens interesting avenues spanning
algorithms, implementation and analysis, which form our ongoing work.
Exploration of new, more efficient, robust and churn resilient
algorithms for anonymous gossiping is one direction. Clever
implementation, particularly amortizing the various gossiping
overheads (needed during peer sampling and aggregation), provides
nice systems design opportunities. Threat analysis, including
quantifying the trade-offs between the degree of anonymity and the
time and messaging overheads in the peer sampling process, is a third
frontier.